Abstract: Accessing heterogeneous metadata can provide deep and meaningful insights. However, integrating such data is difficult because the sources contain duplicates and conflicts. Eliminating these inconsistencies leads to more effective data integration and hence better mining. This paper presents a knowledge-repository-based data fusion technique that not only eliminates duplicates and conflicts but also identifies the user’s requirements in order to provide more effective and faster results. The knowledge repository is built from the user’s feedback, and subsequent retrievals are made from both the repository and the web in order to increase the hit ratio. As the repository matures, retrievals are confined to the repository alone. Experiments show improved accuracy and faster retrieval, providing a higher overall quality of experience for the user.

Keywords: Conflict Identification; Conflict Resolution; Duplicate Identification; Knowledge Repository; Reinforcement Learning.